Method and System for Video Quality Measurements

ABSTRACT

A method of measuring a quality of a test video stream, the method comprising measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream; measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.

FIELD OF INVENTION

The present invention relates broadly to a method and system for measuring a quality of a test video stream.

BACKGROUND

In image and video manipulations, apart from on-line and off-line visual quality evaluation, how to gauge distortion also plays a determinative role in shaping most algorithms, such as enhancement, reconstruction, data hiding, compression, and joint source/channel coding. Visual quality control within an encoder and distortion assessment for the decoded signal are particularly of interest due to the widespread applications of H.26x/MPEG-x compression and coding. Since human eyes are the end receiver of most decoded images and video, it is desirable to develop visual quality metrics that correlate better with human eyes' perception than the conventional pixel-wise error (e.g., mean-squared-error (MSE), peak signal-to-noise ratio (PSNR)) measures.

Perceptual models based upon human vision characteristics have been proposed. In one such metric proposal the colour-transformed original and decoded sequences are subjected to blocking and Discrete Cosine Transform (DCT), and the resultant DCT coefficients are then converted to the local contrast, which is defined as the ratio of the AC amplitude to the temporally low-pass filtered DC amplitude. A temporal recursive discrete second-order IIR filtering operation follows to implement the temporal part of the contrast sensitivity function (CSF). The results are then converted to measures of visibility by dividing each coefficient by its respective visual spatial threshold. The difference of two sequences is subjected to a contrast masking operation, and finally the masked difference is pooled over various dimensions to illustrate perceptual error.

With the same paradigm, another approach termed Winkler's metric consists of colour conversion, temporal filters, spatial subband filters, contrast control, and pooling for various channels, which are based on the spatio-temporal mechanisms in the human visual system. The difference of original and decoded video is evaluated to give an estimate of visual quality of the decoded signal. The metric's parameters are determined by fitting the metric's output to the experimental data on human eyes.

Prevalent visual coding schemes (e.g., DCT- or wavelet-based) introduce specific types of artefacts such as blockiness, ringing and blurring. The metrics in such coding may evaluate blocking artefacts as the distortion measure. Other metrics measure five types of error (i.e., low-pass filtered error, Weber's law and CSF corrected error, blocking error, correlated error, and high contrast transitional error), and use Principal Component Analysis to decide the compound effect on visual quality.

Switching between a perceptual model and a blockiness detector depending on the video under test has also been suggested.

Another proposed perceptual distortion metric architecture consists of opponent colour conversion, perceptual decomposition, masking, followed by pooling. In this method, the spatial frequency and orientation-selective filtering and temporal filtering are performed in the frequency (spectral) domain. The behaviour of the human vision system is modelled by cascading a 3-D filter bank and the non-linear transducer that models masking. The filter bank used in one proposed model is separable in the spatial and temporal directions. The model features 17 Gabor spatial filters and 2 temporal filters. A non-linear transducer modelling of masking has been utilized. In a simplified version, the perceptual model is applied to blockiness-dominant regions.

A software tool for measuring the perceptual quality of digital still images has been provided in the market. Five proprietary full reference perceptual metrics, namely blockiness, blurriness, noise, colourfulness and a mean opinion score, have been developed. However, since these methods are proprietary, there are no descriptions available of how these metrics' outputs are calculated.

A full reference video quality metric has also been proposed. For each frame, corresponding local areas are extracted from both the original and test video sequences respectively. For each selected local area, statistical features such as mean and variance are calculated and used to classify the local area into smooth, edge, or texture region. Next a local correlation quality index value is calculated and these local measures are averaged to give a quality value of the entire frame. The frame quality value is adjusted by two factors: the blockiness factor and motion factor. The blockiness measurement is evaluated in the power spectrum of the image signal. This blockiness measure is used to adjust the overall quality value only if the frame has a relatively high quality index value but severe blockiness. The motion measurement is obtained by a simple block-based motion estimation algorithm. This motion adjustment is applied only if a frame simultaneously satisfies the conditions of low quality index value, high blurriness and low blockiness. Finally, all frame quality index values are averaged to a single overall quality value of the test sequence.

SUMMARY

In accordance with a first aspect of the present invention there is provided a method of measuring a quality of a test video stream, the method comprising measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream; measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.

The measuring of the content richness fidelity feature of the test video stream may be based on a sum of products for each colour and each frame, wherein each product comprises the product of a probability of occurrence of said each colour in said each image frame, and a logarithm of said probability.

The measuring of the block-fidelity feature of the test video stream may be based on distortion at 4-pixel sub-block boundaries.

The measuring of the content richness fidelity feature may be based on occurrences of color values in corresponding image frames of the test video stream and an original video stream from which the test video stream has been derived.

The measuring of the block-fidelity feature of the test video stream may be based on distortion at block-boundaries in corresponding image frames of the test video stream and the original video stream from which the test video stream has been derived.

The measuring of the distortion-invisibility feature of the test video stream may be based on a visibility threshold value.

The visibility threshold value may be based on one or more masking effects determined for the test video stream and the original video stream.

The masking effects may comprise one or more of a group consisting of colour masking, temporal masking and spatial-textural masking.

The measuring of the distortion-invisibility feature of the test video stream is based on distortion at pixels of corresponding image frames of the test video stream and the original video stream from which the test video stream has been derived.

The measuring of the distortion-invisibility feature of the test video stream is based on processing of current and previous image frames of the test video stream and the original video stream.

In accordance with a second aspect of the present invention there is provided a system for measuring a quality of a test video stream, the system comprising means for measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; means for measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream; means for measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and means for determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.

In accordance with a third aspect of the present invention there is provided a system for measuring a quality of a test video stream, the system comprising a color processor measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; a distortion processor measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream and measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and a quality rating processor determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.

In accordance with a fourth aspect of the present invention there is provided a computer readable data storage medium having stored thereon program code means for instructing a computer to execute a method of measuring a quality of a test video stream, the method comprising measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream; measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be better understood and readily apparent to one of ordinary skill in the art from the following written description, by way of example only, and in conjunction with the drawings, in which:

FIG. 1 is a schematic drawing illustrating a 7×7 mask utilised in determining the average background luminance around a pixel, in accordance with an example embodiment.

FIG. 2 is a schematic drawing illustrating 7×7 masks utilised in calculating the average luminance around a pixel, according to an example embodiment.

FIG. 3 is a schematic drawing illustrating a 7×7 low-pass filter utilised in reducing the spatial-textural masking at edge locations, in accordance with an example embodiment.

FIG. 4 is a flow-chart illustrating the computing of the block-fidelity feature in an example embodiment.

FIG. 5 is a flow-chart illustrating the computing of the content richness fidelity in an example embodiment.

FIG. 6 is a flow-chart illustrating the process for computing distortion-invisibility in an example embodiment.

FIG. 7 is a schematic drawing of a video quality measurement system, according to an example embodiment.

FIG. 8 is a flow-chart illustrating the process for computing the overall video quality measure in an example embodiment.

FIG. 9 shows a scatterplot of subjective ratings versus ratings obtained using a prior art visual quality metric.

FIG. 10 shows a scatterplot of subjective ratings versus the video quality ratings obtained in accordance with an example embodiment.

FIG. 11 is a schematic drawing of a computer system for implementing the method and system according to an example embodiment.

DETAILED DESCRIPTION

The example embodiment described comprises an objective video quality measurement method to automatically measure the perceived quality of a stream of video images. The method is based on a combined measure of distortion-invisibility, block-fidelity, and content richness fidelity.

The example embodiment seeks to provide an automatic and objective video quality measurement method that is able to emulate human vision in detecting the perceived quality of a video stream. Traditionally, video quality assessment is performed via a subjective test in which a large number of human subjects are used to gauge the quality of a video, but this process is not only time-consuming but also tedious and expensive to perform. The example embodiments seek to replace the need for a subjective test in gauging the perceived quality of a video stream.

Generally, the example embodiment consists of the computation of a video quality rating using a video quality model made up of the following components: (1) content richness fidelity ($F_{RF}$), (2) block-fidelity ($F_{BF}$), and (3) distortion-invisibility ($D$).

The content richness fidelity feature measures the fidelity of the richness of a test video's content with respect to the original (undistorted) reference video. This content richness fidelity feature gives higher values for a test video which has better fidelity in content richness with respect to the original (undistorted) reference video.

The block-fidelity feature measures the amount of distortion at block-boundaries in the test video when compared with the original (undistorted) reference video. The block-fidelity feature should give lower values when distortion at block-boundaries in the test video is more severe, and higher values when distortion is very low or does not exist in the test video (when compared to the original (undistorted) reference video).

The distortion-invisibility feature measures the average amount of distortion that may be visible at each pixel with respect to a visibility threshold, and gives higher values for lower visible distortions and lower values for higher visible distortions.

A combined measure based on these video-quality features is proposed in the example embodiment and demonstrated to measure visual quality for video.

Content Richness Fidelity

The content richness fidelity feature of the example embodiment measures the fidelity of the richness of the test video's content with respect to the original reference (undistorted) video. This content richness fidelity feature gives higher values for a test video which has better fidelity in content richness with respect to the original reference (undistorted) video. This feature closely correlates with human perceptual response, which tends to assign better subjective ratings to more lively and more colourful images and lower subjective ratings to dull and unlively images.

The image content richness fidelity feature for each individual frame of time interval t of the video can be defined as

$$F_{RF}(t) = e^{0.25\,R_d(t)/R_o(t)}, \quad\text{and}\quad R(t) = -\sum_{p(i) \neq 0} p(i)\,\log_e p(i),$$

where the subscript o refers to the original video sequence, the subscript d refers to the test video sequence, $t \in [1,n]$, n is the total number of image-frames in the video sequence, and:

$$p(i) = \frac{N(i)}{\sum_{\forall i} N(i)}.$$

Here, i is a particular colour (either the luminance or the chrominance) value, $i \in [0,255]$, N(i) is the number of occurrences of i in the image frame, and p(i) is the probability or relative frequency of i appearing in the image frame.
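As a concrete illustration, the per-frame computation above maps onto a histogram-and-entropy calculation. The following is a minimal sketch (not from the patent), assuming 8-bit single-channel frames stored as numpy arrays; all function names are illustrative:

```python
import numpy as np

def richness(frame):
    # R(t) = -sum over colours i of p(i) * log_e(p(i)), omitting p(i) == 0
    counts = np.bincount(frame.ravel(), minlength=256)  # N(i) for i in [0, 255]
    p = counts / counts.sum()                           # p(i) = N(i) / sum N(i)
    p = p[p > 0]                                        # the sum skips p(i) == 0
    return -np.sum(p * np.log(p))

def content_richness_fidelity(ref_frame, test_frame):
    # F_RF(t) = e^(0.25 * R_d(t) / R_o(t))
    return np.exp(0.25 * richness(test_frame) / richness(ref_frame))
```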

FIG. 5 shows a flow chart of the process involved in computing the content richness fidelity in the example embodiment. At steps 502 and 503, the probability of each colour is determined for the current frame of a reference video and of a test video respectively.

At steps 504 and 505, the product of the probability and the log of the probability for each colour is determined for the reference video and the test video respectively.

At steps 506 and 507, the products are summed for the reference video and the test video respectively.

At steps 508 and 509, the negative of the summed products is output for the reference video and the test video respectively.

At step 510, the content richness fidelity feature is computed and output.

Block-Fidelity

The block-fidelity feature of the example embodiment measures the amount of distortion at block-boundaries in the test video when compared with the original reference (undistorted) video. The block-fidelity feature should give lower values when distortion at block-boundaries in the test video is more severe, and higher values when distortion is very low or does not exist in the test video (when compared to the original reference (undistorted) video).

The blocking effect, and its propagation through reconstructed video sequences, is one of the significant coding artefacts that often occur in video compression. The blocking effect is also a source of a number of other types of reconstruction artifacts, such as stationary area granular noise.

The block-fidelity measure for each individual frame of the video is defined as follows:

$$F_{BF}(t) = e^{-0.25\,\left|\left(B_d^h(t) + B_d^v(t)\right) - \left(B_o^h(t) + B_o^v(t)\right)\right| \,/\, \left(B_o^h(t) + B_o^v(t)\right)},$$

where the subscript o refers to the original video sequence, d refers to the test video sequence, and:

$$B^h(t) = \frac{1}{H\left(\lfloor W/4 \rfloor - 1\right)} \sum_{y=1}^{H} \sum_{x=1}^{\lfloor W/4 \rfloor - 1} d^h(4x, y, t), \quad\text{and}\quad d^h(x,y,t) = I(x+1, y, t) - I(x, y, t).$$

$I(x,y,t)$ denotes the colour value of the input image frame I at pixel location (x,y) and time interval t, H is the height of the image, W is the width of the image, and $x \in [1,W]$ and $y \in [1,H]$.

Similarly,

$$B^v(t) = \frac{1}{W\left(\lfloor H/4 \rfloor - 1\right)} \sum_{y=1}^{\lfloor H/4 \rfloor - 1} \sum_{x=1}^{W} d^v(x, 4y, t), \quad\text{and}\quad d^v(x,y,t) = I(x, y+1, t) - I(x, y, t).$$

In effect, $B^h$ and $B^v$ are computed from block boundaries interspaced 4 pixels apart in the horizontal and vertical directions respectively.
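A minimal sketch of this computation is given below (illustrative, not the patent's implementation); it assumes 8-bit single-channel frames and, since the printed formula leaves the sign of the accumulated $d^h$ and $d^v$ terms ambiguous, it assumes absolute differences:

```python
import numpy as np

def boundary_distortion(frame):
    # B^h(t) + B^v(t): average colour jumps across the 4-pixel boundary grid.
    # Absolute differences are an assumption here (see lead-in).
    f = frame.astype(np.float64)
    H, W = f.shape
    cols = 4 * np.arange(1, W // 4)   # 0-based columns of d^h(4x, y, t)
    rows = 4 * np.arange(1, H // 4)   # 0-based rows of d^v(x, 4y, t)
    bh = np.abs(f[:, cols] - f[:, cols - 1]).sum() / (H * (W // 4 - 1))
    bv = np.abs(f[rows, :] - f[rows - 1, :]).sum() / (W * (H // 4 - 1))
    return bh + bv

def block_fidelity(ref_frame, test_frame):
    # F_BF(t) = e^(-0.25 * |B_d - B_o| / B_o)
    b_o = boundary_distortion(ref_frame)
    b_d = boundary_distortion(test_frame)
    return np.exp(-0.25 * abs(b_d - b_o) / b_o)
```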

FIG. 4 shows a flow chart of computing the block-fidelity feature in the example embodiment. At steps 402 and 403, the differences in colour values at blocks' boundaries in a first direction are determined for a current frame of the reference video and the test video respectively. At steps 404 and 405, an average difference for blocks' boundaries in the first direction across a second direction is determined for the reference video and the test video respectively. At steps 406 and 407, a component for the first direction is determined for the reference video and the test video respectively.

At steps 408 and 409, differences in colour values at blocks' boundaries in the second direction are determined for the reference video and the test video respectively. At steps 410 and 411, an average difference for blocks' boundaries in the second direction across the first direction is determined for the reference video and the test video respectively. At steps 412 and 413, a component for the second direction is determined for the reference video and the test video respectively. At step 414, the block-fidelity feature is computed and output.

Distortion-Invisibility

The distortion-invisibility feature in the example embodiment measures the average amount of distortion that may be visible at each pixel with respect to a visibility threshold, and gives higher values for lower visible distortions and lower values for higher visible distortions. The distortion-invisibility measure, D(t), for each frame of the video is given by:

$$D(t) = \left(\frac{1}{WH} \sum_{x=1}^{W} \sum_{y=1}^{H} \left[\gamma_1 + \frac{\hat{d}(x,y,t)}{\gamma_2 + T(x,y,t)}\right]\right)^{-1}$$

$T(x,y,t)$ is the visibility threshold at a particular pixel location (x,y) and time interval t, W and H are the width and height of the video frame respectively, $1 \le x \le W$, $1 \le y \le H$, $\gamma_1$ is included for introducing linearity into the equation, and $\gamma_2$ prevents division by zero in the equation.

Also:

$$\hat{d}(x,y,t) = \begin{cases} d(x,y,t) & \text{if } d(x,y,t) \ge T(x,y,t) \\ 0 & \text{otherwise} \end{cases}$$

where $d(x,y,t)$ is the difference between a frame in the test video $I_d$ and the reference video $I_o$ at the same pixel location (x,y) and time t, and is defined as:

$$d(x,y,t) = \left|I_o(x,y,t) - I_d(x,y,t)\right|$$

Here, $I_o(x,y,t)$ denotes a pixel at location (x,y) at frame t of the original video sequence, while $I_d(x,y,t)$ denotes a pixel at location (x,y) at frame t of the test video sequence.
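Given a precomputed visibility-threshold map T (its construction is described below), the per-frame measure reduces to a few array operations. The following is an illustrative sketch, not the patent's implementation; the values of gamma1 and gamma2 are placeholders for the unpublished constants:

```python
import numpy as np

def distortion_invisibility(ref_frame, test_frame, T, gamma1=1.0, gamma2=1.0):
    # d(x,y,t) = |I_o - I_d|; distortion below the threshold T is invisible
    d = np.abs(ref_frame.astype(np.float64) - test_frame.astype(np.float64))
    d_hat = np.where(d >= T, d, 0.0)
    # D(t) is the reciprocal of the mean per-pixel visible distortion
    return 1.0 / np.mean(gamma1 + d_hat / (gamma2 + T))
```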

The visibility threshold T is given by:

$$T(x,y,t) = \left(T^l(x,y,t) + T^s(x,y,t) - C^{ls} \cdot \min\left\{T^l(x,y,t), T^s(x,y,t)\right\}\right) \cdot T^m(x,y,t)$$

The visibility threshold at a particular pixel located at position (x,y) and time t, denoted $T(x,y,t)$, provides an indication of the maximum allowable distortion at a particular pixel in the image frame which will still not be visible to human eyes. Here, $T^l(x,y,t)$, $T^s(x,y,t)$ and $T^m(x,y,t)$ can be regarded as effects due to colour masking, spatial-textural masking, and temporal masking respectively at a particular pixel located at position (x,y) in the image frame at time interval t in the video sequence, while $C^{ls}$ is a constant. The three masking effects interact in the manner described by the above equation in order to provide the visibility threshold required for this objective video quality measurement method. Literally, the visibility threshold is made up of additive-cum-weak-cancellation interactions of both the colour masking term $T^l$ and the spatial-textural masking term $T^s$ (mathematically expressed as $T^l(x,y,t) + T^s(x,y,t) - C^{ls} \cdot \min\{T^l(x,y,t), T^s(x,y,t)\}$), followed by a multiplicative interaction with the temporal masking term $T^m$.

Masking is an important visual phenomenon and can explain why similar artifacts are disturbing in certain regions (such as flat regions) of an image frame while they are hardly noticeable in other regions (such as textured regions). In addition, similar artifacts in certain regions of different video sequences displaying different temporal characteristics will appear disturbing in a particular video sequence but not in another. In the example embodiment, these visual phenomena have been modelled using colour masking, spatial-textural masking, and temporal masking, which are further described in the sections below.

The temporal masking $T^m$ attempts to emulate the effect of human vision's characteristic of being able to accept higher video-frame distortion due to larger temporal changes, and can be derived as follows:

$$T^m(x,y,t) = e^{f_s \cdot f_r} \begin{cases} T_2^m & \text{if } \left|d_f(x,y,t)\right| \le T_3^m \\ T_1^m\left(z_2^{1 - (L_m - d_f(x,y,t))/(L_m - T_3^m)} - 1\right) + T_2^m & \text{if } d_f(x,y,t) < -T_3^m \\ T_o^m\left(z_1^{1 - (L_m - d_f(x,y,t))/(L_m - T_3^m)} - 1\right) + T_2^m & \text{otherwise} \end{cases}$$

where $d_f(x,y,t)$ is the inter-frame difference at a particular pixel location (x,y) in time t between a current frame $I_o(x,y,t)$ and a previous coded frame $I_o(x,y,t - f_f/f_r)$ (assuming that frames that have been coded at below the full frame rate have been repeated in this video sequence), and is mathematically expressed as:

$$d_f(x,y,t) = I_o(x,y,t) - I_o(x,y,t - f_f/f_r)$$

Here, $f_r$ is the frame rate at which the video has been compressed, $f_f$ is the full frame rate, $f_s$ is a scaling factor, while $L_m$, $T_o^m$, $T_1^m$, $T_2^m$, $T_3^m$, $z_1$, and $z_2$ are constants used to determine the exact profile of the temporal masking.
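The piecewise profile above can be evaluated over a whole difference map at once. The sketch below is illustrative only: every constant is a placeholder rather than one of the patent's optimised values, and the symmetric-threshold reading of the first branch is an assumption:

```python
import numpy as np

def temporal_masking(d_f, f_s=0.01, f_r=30.0, L_m=255.0,
                     T0=4.0, T1=4.0, T2=4.0, T3=5.0, z1=2.0, z2=2.0):
    # Branches of the T^m profile; all constants are illustrative placeholders.
    expo = 1.0 - (L_m - d_f) / (L_m - T3)
    pos = T0 * (z1 ** expo - 1.0) + T2        # larger positive changes
    neg = T1 * (z2 ** expo - 1.0) + T2        # larger negative changes
    core = np.where(np.abs(d_f) <= T3, T2, np.where(d_f < -T3, neg, pos))
    return np.exp(f_s * f_r) * core           # e^(f_s * f_r) scaling
```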

The colour masking attempts to emulate the effect of human vision's characteristic of being able to accept higher video-frame distortion when the background colour is above or below a certain mid-level threshold, and can be derived as follows:

$$T^l(x,y,t) = \begin{cases} T_1^l\left(v_2^{1 - (\lfloor L_l/2 \rfloor + b(x,y,t))/(L_l - r)} - 1\right) + T_2^l & \text{if } b(x,y,t) \le (L_l - r) \\ T_o^l\left(v_1^{1 - (\lfloor 3L_l/2 \rfloor - b(x,y,t))/(L_l - r)} - 1\right) + T_2^l & \text{if } b(x,y,t) > (L_l + r) \\ T_2^l & \text{otherwise} \end{cases}$$

Here, $T_o^l$, $T_1^l$, $T_2^l$, $L_l$, $r$, $v_1$, and $v_2$ are constants used to determine the exact profile of the colour masking.
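A corresponding sketch for the colour masking profile, again with purely illustrative placeholder constants (a mid-level $L_l = 128$ with tolerance $r$ is assumed for 8-bit data):

```python
import numpy as np

def colour_masking(b, L_l=128.0, r=16.0, T0=8.0, T1=8.0, T2=4.0,
                   v1=2.0, v2=2.0):
    # b is the background-colour map b(x,y,t); constants are placeholders.
    low  = T1 * (v2 ** (1.0 - (L_l // 2 + b) / (L_l - r)) - 1.0) + T2
    high = T0 * (v1 ** (1.0 - (3.0 * L_l // 2 - b) / (L_l - r)) - 1.0) + T2
    return np.where(b <= L_l - r, low,
                    np.where(b > L_l + r, high, T2))
```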

The spatial-textural masking attempts to emulate the effect of human vision's characteristic of being able to accept higher video-frame distortion when the particular point has a richer texture or spatial profile, and can be derived as follows:

$$T^s(x,y,t) = \left(m(x,y,t)\,b(x,y,t)\,\alpha_1 + m(x,y,t)\,\alpha_2 + b(x,y,t)\,\alpha_3 + \alpha_4\right) W(x,y,t).$$

Here, $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are constants used to determine the exact profile of the spatial-textural masking.

In the spatial-textural masking, $m(x,y,t)$ is the average of the average colour value $g_k(x,y,t)$ in four different orientations; it attempts to capture the textural characteristic of the small local region centred on pixel (x,y,t) and can be mathematically written as:

$$m(x,y,t) = \frac{1}{4} \sum_{k=1}^{4} g_k(x,y,t).$$

Also, $g_k(x,y,t)$ is the average colour value around a pixel located at position (x,y) of a frame in the original reference video sequence at time interval t, and is computed by convolving a 7×7 mask, $G_k$, with this particular frame in the original reference video sequence. Mathematically, $g_k(x,y,t)$ can be expressed as:

$$g_k(x,y,t) = \frac{1}{19} \sum_{m=-3}^{3} \sum_{n=-3}^{3} I_o(x+m, y+n, t) \cdot G_k(m+4, n+4).$$

The four 7×7 masks, $G_k$, for k = {1,2,3,4}, shown in FIG. 2 at numerals 202, 204, 206 and 208 respectively, are four differently oriented gradient masks used to capture the strength of the gradients around a pixel located at position (x,y,t).

Here, $b(x,y,t)$ is the average background colour value around a pixel located at position (x,y) of a frame in the original reference video sequence at time interval t, and is computed by convolving a 7×7 mask, B, shown in FIG. 1 at numeral 102, with this particular frame in the original reference video sequence. Mathematically, $b(x,y,t)$ can be expressed as:

$$b(x,y,t) = \frac{1}{40} \sum_{m=-3}^{3} \sum_{n=-3}^{3} I_o(x+m, y+n, t) \cdot B(m+4, n+4).$$

The 7×7 mask, B, acts like a low-pass filter when operated on a pixel located at position (x,y,t).

In addition, $W(x,y,t)$ is an edge-adaptive weight of the pixel at location (x,y) of a frame in the original reference video sequence at time interval t; it attempts to reduce the spatial-textural masking at edge locations, because artifacts that are found on essential edge locations tend to reduce the visual quality of the image frame. The corresponding edge-adaptive weight matrix W, obtained by convolving $\hat{E}$ with the 7×7 low-pass filter g shown in FIG. 3 at numeral 302, is given by:

$$W = \hat{E} * g, \quad \hat{E} = 1 - 0.9E,$$

where * is a convolution operator, and E is the edge matrix of the original image frame, computed with any edge detection technique, containing values of 1 and 0 for edge and non-edge pixels respectively.
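Putting the pieces together, the spatial-textural term and the masking interaction can be sketched as below. This is illustrative only: the actual 7×7 masks of FIGS. 1-3 are passed in as arrays, the alpha values and C_ls are placeholders for the patent's optimised constants, and scipy's correlate/convolve stand in for the windowed sums:

```python
import numpy as np
from scipy.ndimage import convolve, correlate

def spatial_textural_masking(ref_frame, G_masks, B_mask, edge_map, g_lp,
                             alphas=(0.01, 0.01, 0.01, 1.0)):
    # G_masks: the four 7x7 oriented gradient masks (FIG. 2); B_mask: the
    # 7x7 background mask (FIG. 1); g_lp: the 7x7 low-pass filter (FIG. 3);
    # edge_map: binary edge matrix E. Alpha values are placeholders.
    f = ref_frame.astype(np.float64)
    g = [correlate(f, Gk, mode='nearest') / 19.0 for Gk in G_masks]
    m = sum(g) / 4.0                                   # m(x,y,t)
    b = correlate(f, B_mask, mode='nearest') / 40.0    # b(x,y,t)
    W = convolve(1.0 - 0.9 * edge_map, g_lp, mode='nearest')  # W = E_hat * g
    a1, a2, a3, a4 = alphas
    return (m * b * a1 + m * a2 + b * a3 + a4) * W

def visibility_threshold(T_l, T_s, T_m, C_ls=0.5):
    # T = (T^l + T^s - C^ls * min{T^l, T^s}) * T^m; C_ls is a placeholder.
    return (T_l + T_s - C_ls * np.minimum(T_l, T_s)) * T_m
```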

FIG. 6 shows a flow chart illustrating the process for computing distortion-invisibility in the example embodiment. At steps 602 and 604, colour masking and spatial-textural masking respectively are performed on the current frame of the reference video. At step 606, temporal masking is performed between the current frame of the reference video and the previous frame of the reference video. At step 608, a frame difference is determined between the current frame of the test video and the current frame of the reference video.

At step 610, masking interactions are performed based on $T^l$, $T^s$, and $T^m$ from steps 602, 604, and 606 to produce a visibility threshold T. At step 612, the distortion-invisibility D is computed based on an output from the masking interactions step 610 and d from the frame difference determination step 608, and the distortion-invisibility D is output.

Video Quality Measurement Method

The overall objective video quality rating in the example embodiment for a video sequence, Q, is given by averaging the objective video quality rating for each frame q(t), and can be expressed as:

$$Q = \sum_{t = i \cdot (f_f/f_r)}^{n} \left[q(t)\right] / n_t, \quad i = 1, 2, \ldots$$

where n is the total number of frames in the original video sequence, and $n_t$ is the total number of coded frames in the video sequence (which is different if the video is coded at below the full frame rate) and is given by:

$$n_t = n / (f_f / f_r),$$

and q(t) is the objective video quality rating for each frame, defined as follows:

$$q(t) = D(t) \cdot F_{BF}(t) \cdot F_{RF}(t)$$
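An end-to-end sketch of the per-sequence rating follows, assuming the illustrative helper functions sketched earlier in this section and a precomputed list of visibility-threshold maps; the frame stepping follows the summation limits above:

```python
def video_quality(ref_frames, test_frames, threshold_maps, f_f=30.0, f_r=30.0):
    # Q: average of q(t) = D(t) * F_BF(t) * F_RF(t) over the n_t coded frames,
    # stepping f_f / f_r frames at a time (coded frames are repeated when
    # the video is coded below the full frame rate).
    step = int(f_f / f_r)
    q = []
    for t in range(step - 1, len(ref_frames), step):   # t = i * (f_f / f_r)
        ref, tst = ref_frames[t], test_frames[t]
        q.append(distortion_invisibility(ref, tst, threshold_maps[t])
                 * block_fidelity(ref, tst)
                 * content_richness_fidelity(ref, tst))
    return sum(q) / len(q)                             # n_t = n / (f_f / f_r)
```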

FIG. 7 shows a block diagram of the video quality measurement system 700 of the example embodiment. The system 700 comprises a block-fidelity feature extraction module 702, a content richness fidelity feature extraction module 704, a distortion-invisibility feature extraction module 706, and a video quality model module 708. The current frame of the test video is input to each of the modules 702, 704 and 706. The current frame of the reference video is input to each of the modules 702, 704 and 706. The previous frame of the reference video is input to module 706.

In the video quality model module 708, a video quality measure of the current frame is generated and output, based on the respective outputs of modules 702, 704 and 706.

For colour video sequences, the overall objective video rating for a colour video sequence, $Q_c$, is given by a weighted averaging of the objective video quality rating for each colour's $q_j(t)$, for j = 1, ..., a, where a is the maximum number of colour components, and can be expressed as:

$$Q_c = \sum_{j=1}^{a} \alpha_j \left( \sum_{t = i \cdot (f_f/f_r)}^{n} \left[q_j(t)\right] / n_t \right), \quad i = 1, 2, \ldots$$

where $\alpha_j$ denotes the weightage for each colour component.

FIG. 8 shows a flow chart illustrating the process for computing the overall video quality measure for a test video sequence in the example embodiment. At step 802, the test video sequence of images is input. At step 804, a counter i is set to zero. At step 806, the counter i is incremented by one. At step 808, the content richness fidelity is determined as previously described with reference to FIG. 5. At step 810, the block-fidelity is determined as previously described with reference to FIG. 4. At step 812, the distortion-invisibility is determined as previously described with reference to FIG. 6. At step 814, the video quality measure is determined for frame i, as previously described with reference to FIG. 7.

At step 816, it is determined whether or not the input video sequence or clip has finished. If not, the process loops back to step 806. If the clip has finished, the video quality measure Q is determined for the clip at step 818, and output at step 820.

METRIC PARAMETERIZATION

This step is used to derive the values of the various parameters being used in the video quality measurement model in the example embodiment.

Parameters Optimization Method

The parameters of the video quality measurement model in the example embodiment have been obtained by optimising them with respect to their correlation with human visual subjective ratings.

The parameters' optimization method used is a modified version of the Hooke & Jeeves pattern search method, due to its robustness, simplicity and efficiency. Reference is made to Hooke R., Jeeves T. A., "'Direct Search' solution of numerical and statistical problems", Journal of the Association for Computing Machinery, Vol. 8, 1961, pp. 212-229. The algorithm performs well in curve fitting and linear equation solving and has been successfully applied to many applications.

Application of Pattern Search Strategy

The Hooke & Jeeves method has been used here due to its use of a simple strategy rather than complex tactics. It makes two types of move: exploratory moves and pattern moves. Each successful point is termed a base point, so the process proceeds from base point to base point.

An exploratory move is designed to acquire knowledge concerning the local behaviour of the objective function. This knowledge is inferred entirely from the success or failure of the exploratory moves and utilized by combining it into a 'pattern', which indicates a probable direction for a successful move. For simplicity, the exploratory moves here are taken to be simple; that is, at each move only the value of a single coordinate is changed. From a base point, a pattern move is designed to utilize the information acquired in the previous exploratory moves and accomplish the minimization of the objective function by moving in the direction of the established 'pattern'. On an intuitive basis, the pattern move from the base point duplicates the combined moves from the previous base point. A sequence of exploratory moves then follows and may result in a success or a failure.

In the case of a success, the final point reached becomes a new base point and a further pattern move is conducted. The length of the pattern move may reach many times the size of the base step.

In the case of failure, the pattern move is abandoned. From the base point, another series of exploratory moves is made in order to establish an entirely new pattern.

For any given value of step size, the search will reach an impasse when no new base point is found. The step size can be reduced to continue the search, but should be kept above a practical limit imposed by the application. The exploratory moves are stopped when the step size is sufficiently small, and the final termination of the search is made when no more base points are found, or the optimization result is good enough.
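The strategy just described condenses into a short routine. The following is a minimal illustrative sketch of Hooke & Jeeves direct search (not the patent's tuned optimiser); in this context, f would be an objective such as one minus the correlation of the metric's output with subjective ratings:

```python
def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-4):
    def explore(x, fx):
        # Exploratory moves: perturb one coordinate at a time by +/- step.
        x = list(x)
        for i in range(len(x)):
            for d in (step, -step):
                trial = list(x)
                trial[i] += d
                ft = f(trial)
                if ft < fx:
                    x, fx = trial, ft
                    break
        return x, fx

    base, fbase = list(x0), f(list(x0))
    while step > tol:
        x, fx = explore(base, fbase)
        if fx < fbase:
            # Success: the pattern move duplicates the combined move from
            # the previous base point, then we explore around that point.
            pattern = [2.0 * a - b for a, b in zip(x, base)]
            base, fbase = x, fx
            px, pfx = explore(pattern, f(pattern))
            if pfx < fbase:
                base, fbase = px, pfx
        else:
            # Failure: abandon the pattern and reduce the step size.
            step *= shrink
    return base, fbase
```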

Parameters Optimization Based on Subjective Ratings

The parameters of the video quality measurement model in the example embodiment have been obtained by optimising them with respect to their correlation with human visual subjective ratings. The correlation measure used in the optimisation is selected to be the Pearson correlation of the logistic fit. Table 1 summarizes the test conditions for the original video sequences. Each of the video sequences consists of 250 frames.

TABLE 1
Test conditions for original video sequences

  Bit rate   7.5 fps             15 fps              30 fps
  24 Kbps    QCIF                QCIF                —
  48 Kbps    QCIF                QCIF                QCIF
  64 Kbps    Both QCIF and CIF   Both QCIF and CIF   QCIF
  128 Kbps   CIF                 CIF                 Both QCIF and CIF
  384 Kbps   —                   —                   CIF

Here, only the QCIF (Quarter Common Intermediate Format) video sequences have been used for obtaining the required optimised parameters of the video quality measurement model; in addition, only decoded frames interspaced at 4-frame intervals are used in the optimisation process, in order to reduce the amount of data used for training and to speed up the optimisation process. After the required parameters have been obtained using the above-mentioned Hooke and Jeeves parameter search method, the video quality measurement method is tested on all the image frames of the 90 test video sequences in the test data set.

To speed up the optimisation process, a 2-step optimisation process has been utilized here. In the first step, the parameters of the visibility threshold, temporal masking and spatial-textural masking, namely $C^{ls}$, $f_s$, $T_0^m$, $T_1^m$, $T_2^m$, $T_3^m$, $z_1$, $z_2$, $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$, have been optimised. In the second step, the colour masking parameters, namely $T_0^l$, $T_1^l$, $T_2^l$, $r$, $v_1$, and $v_2$, are then optimised (using the already optimised parameters obtained in the first step). Finally, the above process is repeated again to ensure that the final optimised parameters estimated are indeed the best parameters obtainable.

The video quality measurement method in the example embodiment has been tested on a test data set consisting of 90 video sequences that have been obtained by subjecting 12 original video sequences to various compression bitrates and frame rates (see Table 1). The performance of the proposed metric is measured with respect to the subjective ratings of these test video sequences, which have been obtained by a subjective video quality experiment.

As mentioned before, the test video sequences are generated by subjecting 12 different original undistorted CIF (Common Intermediate Format) and QCIF (Quarter Common Intermediate Format) video sequences ("Container", "Coast Guard", "Japan League", "Foreman", "News", and "Tempete") to H.26L video compression with different bit rates and frame rates.

Table 2 shows the results of the proposed method with respect to PSNR. The upper bound and lower bound of the Pearson correlation were obtained with a confidence interval of 95%.

TABLE 2
Results of the video quality measurement method of the example embodiment and that given by PSNR

                       Pearson-      Upper   Lower   Spearman-
                       Correlation   Bound   Bound   Correlation
  PSNR                 0.701         0.793   0.578   0.676
  Example embodiment   0.897         0.931   0.848   0.902

FIG. 9 shows the scatterplot 902 of subjective ratings (y-axis) versus the rating values obtained using a prior art visual quality metric, more particularly PSNR values (x-axis), while FIG. 10 shows the scatterplot of subjective ratings (y-axis) versus the video quality ratings (x-axis) estimated utilising the video quality measurement method of an example embodiment of the present invention.

In FIGS. 9 and 10, the middle solid lines (904, 1004) portray the logistic fit using the above-mentioned 4-parameter cubic polynomial, while the upper dotted curves (906, 1006) and the lower dotted curves (908, 1008) portray the upper bound and lower bound respectively, obtained with a confidence interval of 95%.
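For reproducing this kind of evaluation, the fitted-correlation computation can be sketched as follows (illustrative; a cubic polynomial with 4 coefficients stands in for the fit described above):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def fitted_correlations(objective, subjective):
    # Fit subjective ratings as a 4-parameter cubic polynomial of the
    # objective metric, then report the Pearson correlation of the fit
    # and the rank-based Spearman correlation.
    coeffs = np.polyfit(objective, subjective, deg=3)
    fitted = np.polyval(coeffs, objective)
    return pearsonr(fitted, subjective)[0], spearmanr(objective, subjective)[0]
```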

The method and system of the example embodiment can be implemented on a computer system 1100, schematically shown in FIG. 11. It may be implemented as software, such as a computer program being executed within the computer system 1100, and instructing the computer system 1100 to conduct the method of the example embodiment.

The computer system 1100 comprises a computer module 1102, input modules such as a keyboard 1104 and mouse 1106, and a plurality of output devices such as a display 1108 and printer 1110.

The computer module 1102 is connected to a computer network 1112 via a suitable transceiver device 1114, to enable access to e.g. the Internet or other network systems such as a Local Area Network (LAN) or Wide Area Network (WAN).

The computer module 1102 in the example includes a processor 1118, a Random Access Memory (RAM) 1120 and a Read Only Memory (ROM) 1122. The computer module 1102 also includes a number of Input/Output (I/O) interfaces, for example I/O interface 1124 to the display 1108, and I/O interface 1126 to the keyboard 1104.

The components of the computer module 1102 typically communicate via an interconnected bus 1128 and in a manner known to the person skilled in the relevant art.

The application program is typically supplied to the user of the computer system 1100 encoded on a data storage medium such as a CD-ROM or floppy disk and read utilising a corresponding data storage medium drive of a data storage device 1130. The application program is read and controlled in its execution by the processor 1118. Intermediate storage of program data may be accomplished using RAM 1120.

It will be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.

CLAIMS

1. A method of measuring a quality of a test video stream, the method comprising: measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream; measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.

2. The method as claimed in claim 1, wherein the measuring of the content richness fidelity feature of the test video stream is based on a sum of products for each colour and each frame, wherein each product comprises the product of a probability of occurrence of said each colour in said each image frame, and a logarithm of said probability.

3. The method as claimed in claim 1 or 2, wherein the measuring of the block-fidelity feature of the test video stream is based on distortion at 4-pixel sub-block boundaries.

4. The method as claimed in any one of claims 1 to 3, wherein the measuring of the content richness fidelity feature is based on occurrences of color values in corresponding image frames of the test video stream and an original video stream from which the test video stream has been derived.

5. The method as claimed in claim 4, wherein the measuring of the block-fidelity feature of the test video stream is based on distortion at block-boundaries in corresponding image frames of the test video stream and the original video stream from which the test video stream has been derived.

6. The method as claimed in any one of claims 1 to 3, wherein the measuring of the block-fidelity feature of the test video stream is based on distortion at block-boundaries in corresponding image frames of the test video stream and an original video stream from which the test video stream has been derived.

7. The method as claimed in any one of claims 1 to 6, wherein the measuring of the distortion-invisibility feature of the test video stream is based on a visibility threshold value.

8. The method as claimed in claim 7, wherein the visibility threshold value is based on one or more masking effects determined for the test video stream and the original video stream.

9. The method as claimed in claim 8, wherein the masking effects comprise one or more of a group consisting of colour masking, temporal masking and spatial-textural masking.

10. The method as claimed in claim 9, wherein the measuring of the distortion-invisibility feature of the test video stream is based on distortion at pixels of corresponding image frames of the test video stream and the original video stream from which the test video stream has been derived.

11. The method as claimed in any one of claims 1 to 8, wherein the measuring of the distortion-invisibility feature of the test video stream is based on distortion at pixels of corresponding image frames of the test video stream and the original video stream from which the test video stream has been derived.

12. The method as claimed in claim 11, wherein the measuring of the distortion-invisibility feature of the test video stream is based on processing of current and previous image frames of the test video stream and the original video stream.

13. A system for measuring a quality of a test video stream, the system comprising: means for measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; means for measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream; means for measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and means for determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.

14. A system for measuring a quality of a test video stream, the system comprising: a color processor measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; a distortion processor measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream and measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and a quality rating processor determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.

15. A computer readable data storage medium having stored thereon program code means for instructing a computer to execute a method of measuring a quality of a test video stream, the method comprising: measuring a content richness fidelity feature of the test video stream based on occurrences of color values in image frames of the test video stream; measuring a block-fidelity feature of the test video stream based on distortion at block-boundaries in the image frames of the test video stream; measuring a distortion-invisibility feature of the test video stream based on distortion at pixels of the image frames of the test video stream; and determining a quality rating for the test video stream based on the content richness fidelity feature, the block-fidelity feature and the distortion-invisibility feature measured.